47 research outputs found
Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence
Topic models extract meaningful groups of words from documents, allowing for
a better understanding of data. However, the solutions are often not coherent
enough, and thus harder to interpret. Coherence can be improved by adding more
contextual knowledge to the model. Recently, neural topic models have become
available, while BERT-based representations have further pushed the state of
the art of neural models in general. We combine pre-trained representations and
neural topic models. Pre-trained BERT sentence embeddings indeed support the
generation of more meaningful and coherent topics than either standard LDA or
existing neural topic models. Results on four datasets show that our approach
effectively increases topic coherence.
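The abstract does not say which coherence measure is used. As an illustration only, the sketch below assumes normalized pointwise mutual information (NPMI) computed from document-level co-occurrence, a common choice for scoring topics. The function name `npmi_coherence` and the toy corpus are hypothetical and do not come from the paper.

```python
import math
from itertools import combinations


def npmi_coherence(topic_words, documents):
    """Average NPMI over all word pairs of a topic.

    Probabilities are estimated as the fraction of documents that contain
    a word (or both words of a pair). Assumption: this is one common way
    to score coherence, not necessarily the paper's exact setup.
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents containing every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words)) / n_docs

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0.0:
            scores.append(-1.0)  # words never co-occur: lowest NPMI
        elif p12 == 1.0:
            scores.append(0.0)   # both words in every document: NPMI undefined, treated as 0 here
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


# Toy corpus (hypothetical): two "pet" documents and two "traffic" documents.
docs = [["cat", "dog"], ["cat", "dog"], ["car", "road"], ["car", "road"]]
coherent = npmi_coherence(["cat", "dog"], docs)
incoherent = npmi_coherence(["cat", "car"], docs)
```

On this toy corpus, a topic whose words always appear together scores higher than one that mixes words from unrelated documents.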
Cross-lingual Contextualized Topic Models with Zero-shot Learning
Many data sets (e.g., reviews, forums, news) exist in parallel in
multiple languages. They all cover the same content, but the linguistic
differences make it impossible to use traditional, bag-of-words-based topic
models. Models have to be either single-language or suffer from a huge but
extremely sparse vocabulary. Both issues can be addressed by transfer learning.
In this paper, we introduce a zero-shot cross-lingual topic model. Our model
learns topics on one language (here, English), and predicts them for unseen
documents in different languages (here, Italian, French, German, and
Portuguese). We evaluate the quality of the topic predictions for the same
document in different languages. Our results show that the transferred topics
are coherent and stable across languages, which suggests exciting future
research directions.
Comment: Updated version. Published as a conference paper at EACL202
In-Context Learning User Simulators for Task-Oriented Dialog Systems
This paper presents a novel application of large language models in user
simulation for task-oriented dialog systems, specifically focusing on an
in-context learning approach. By harnessing the power of these models, the
proposed approach generates diverse utterances based on user goals and limited
dialog examples. Unlike traditional simulators, this method eliminates the need
for labor-intensive rule definition or extensive annotated data, making it more
efficient and accessible. Additionally, an error analysis of the interaction
between the user simulator and dialog system uncovers common mistakes,
providing valuable insights into areas that require improvement. Our
implementation is available at
https://github.com/telepathylabsai/prompt-based-user-simulator
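The in-context learning setup described above can be pictured as assembling a prompt from a user goal plus a few example dialogs, which a large language model then continues with the next user utterance. The sketch below shows only this prompt assembly and is an assumption. The function names, the prompt wording, and the goal format are hypothetical and are not taken from the linked implementation.

```python
def format_goal(goal):
    """Render a goal dict as 'slot=value' pairs in a stable order."""
    return "; ".join(f"{slot}={value}" for slot, value in sorted(goal.items()))


def format_dialog(turns):
    """Render (speaker, text) turns, one per line."""
    return "\n".join(f"{speaker}: {text}" for speaker, text in turns)


def build_prompt(goal, example_dialogs, history):
    """Assemble a few-shot prompt that ends where the simulated user speaks.

    example_dialogs: list of (goal_dict, turns) pairs used as demonstrations.
    history: turns of the current dialog so far (may be empty).
    """
    parts = ["You are simulating a user talking to a task-oriented dialog system."]
    for ex_goal, ex_turns in example_dialogs:
        parts.append(f"GOAL: {format_goal(ex_goal)}\n{format_dialog(ex_turns)}")
    current = f"GOAL: {format_goal(goal)}"
    if history:
        current += "\n" + format_dialog(history)
    current += "\nUSER:"  # the language model would complete this line
    parts.append(current)
    return "\n\n".join(parts)


# Hypothetical example: one demonstration dialog and a fresh goal.
examples = [(
    {"food": "thai", "area": "centre"},
    [("USER", "I want thai food in the centre."),
     ("SYSTEM", "Bangkok City is a thai restaurant in the centre.")],
)]
prompt = build_prompt(
    {"food": "italian", "area": "north"},
    examples,
    [("SYSTEM", "Hello, how can I help?")],
)
```

Under this assumption, swapping the demonstrations or the goal changes the simulated user's behaviour without any rule definitions or annotated training data.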
FashionCLIP: Connecting Language and Images for Product Representations
The steady rise of online shopping goes hand in hand with the development of
increasingly complex ML and NLP models. While most use cases are cast as
specialized supervised learning problems, we argue that practitioners would
greatly benefit from more transferable representations of products. In this
work, we build on recent developments in contrastive learning to train
FashionCLIP, a CLIP-like model for the fashion industry. We showcase its
capabilities for retrieval, classification and grounding, and release our model
and code to the community.
Comment: Code will soon be available at https://github.com/patrickjohncyh,
dataset at https://github.com/Farfetc
OCTIS: Comparing and optimizing topic models is simple!
In this paper, we present OCTIS, a framework for training, analyzing, and comparing Topic Models, whose optimal hyper-parameters are estimated using a Bayesian Optimization approach. The proposed solution integrates several state-of-the-art topic models and evaluation metrics. These metrics can be targeted as objective by the underlying optimization procedure to determine the best hyper-parameter configuration. OCTIS allows researchers and practitioners to have a fair comparison between topic models of interest, using several benchmark datasets and well-known evaluation metrics, to integrate novel algorithms, and to have an interactive visualization of the results for understanding the behavior of each model. The code is available at the following link: https://github.com/MIND-Lab/OCTIS
Hyperoxemia and excess oxygen use in early acute respiratory distress syndrome: Insights from the LUNG SAFE study
Publisher Copyright: © 2020 The Author(s). Copyright 2020 Elsevier B.V. All rights reserved.
Background: Concerns exist regarding the prevalence and impact of unnecessary oxygen use in patients with acute respiratory distress syndrome (ARDS). We examined this issue in patients with ARDS enrolled in the Large observational study to UNderstand the Global impact of Severe Acute respiratory FailurE (LUNG SAFE) study.
Methods: In this secondary analysis of the LUNG SAFE study, we wished to determine the prevalence and the outcomes associated with hyperoxemia on day 1, sustained hyperoxemia, and excessive oxygen use in patients with early ARDS. Patients who fulfilled criteria of ARDS on day 1 and day 2 of acute hypoxemic respiratory failure were categorized based on the presence of hyperoxemia (PaO2 > 100 mmHg) on day 1, sustained (i.e., present on day 1 and day 2) hyperoxemia, or excessive oxygen use (FIO2 ≥ 0.60 during hyperoxemia).
Results: Of 2005 patients that met the inclusion criteria, 131 (6.5%) were hypoxemic (PaO2 < 55 mmHg), 607 (30%) had hyperoxemia on day 1, and 250 (12%) had sustained hyperoxemia. Excess FIO2 use occurred in 400 (66%) out of 607 patients with hyperoxemia. Excess FIO2 use decreased from day 1 to day 2 of ARDS, with most hyperoxemic patients on day 2 receiving relatively low FIO2. Multivariate analyses found no independent relationship between day 1 hyperoxemia, sustained hyperoxemia, or excess FIO2 use and adverse clinical outcomes. Mortality was 42% in patients with excess FIO2 use, compared to 39% in a propensity-matched sample of normoxemic (PaO2 55-100 mmHg) patients (P = 0.47).
Conclusions: Hyperoxemia and excess oxygen use are both prevalent in early ARDS but are most often non-sustained. No relationship was found between hyperoxemia or excessive oxygen use and patient outcome in this cohort.
Trial registration: LUNG-SAFE is registered with ClinicalTrials.gov, NCT02010073.
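The category definitions and reported proportions in this abstract can be written out as a small sketch. The thresholds (PaO2 < 55 mmHg, 55-100 mmHg, > 100 mmHg, FIO2 ≥ 0.60) and the patient counts come from the abstract. The function names are hypothetical, and the code is illustrative rather than the study's analysis pipeline.

```python
def oxygenation_category(pao2_mmhg):
    """Classify a PaO2 value using the thresholds stated in the abstract."""
    if pao2_mmhg < 55:
        return "hypoxemic"
    if pao2_mmhg > 100:
        return "hyperoxemic"
    return "normoxemic"  # PaO2 55-100 mmHg


def excess_oxygen_use(pao2_mmhg, fio2):
    """Excess oxygen use: FIO2 >= 0.60 while the patient is hyperoxemic."""
    return pao2_mmhg > 100 and fio2 >= 0.60


def sustained_hyperoxemia(pao2_day1, pao2_day2):
    """Hyperoxemia present on both day 1 and day 2."""
    return pao2_day1 > 100 and pao2_day2 > 100


def percent(part, whole):
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole


# Re-deriving the reported proportions from the counts in the abstract.
total = 2005
hypoxemic_pct = percent(131, total)       # reported as 6.5%
hyperoxemic_pct = percent(607, total)     # reported as 30%
sustained_pct = percent(250, total)       # reported as 12%
excess_fio2_pct = percent(400, 607)       # reported as 66% of hyperoxemic patients
```

Note that the 66% figure uses the 607 hyperoxemic patients as its denominator, not the full cohort of 2005.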
Pre-training is a hot topic: contextualized document embeddings improve topic coherence
No abstract available